Pub Quiz

Create your team

  • Teams can have 3 to 5 people
  • Get a blue and a red pen and a sheet of paper
  • Write at the top of the paper:
    • Your team name
    • The names of the members of your team

How it works

Questions

  • There will be 2 rounds
  • Each round has one question from each of the following categories:
    • What's the output of the script?
    • PyData community
    • Which library is it?
    • Monty Python
    • Versions
    • Barcelona community
    • Data science theory
    • Performance
    • Exceptions
    • Data science celebrities
  • For each answer, write the round and the question number, in the format R1Q1
  • Consider the latest versions of CPython and the libraries
  • You can't check your phones, tablets, laptops... during the rounds

How it works

Answers

  • At the end of each round, we'll give the answers
  • Give your paper with the answers to the team next to you
  • Write in red the points:
    • 0 for wrong answers
    • 1 for right answers
    • 3 for exceptionally good answers (we'll let you know)
  • At the end, add up all the points, and report the total

Example

R9Q9 - Which library is it?

import who_am_i as wai


df = wai.read_csv('some_clean_data.csv',
                  parse_dates=True,
                  index_col='id')
df.describe()
df['val'].plot()

Write on your paper:

  • R9Q9: pandas

Points written on another team's paper:

  • R9Q9: pandas 1

Round 0

R0Q0: What's the output?

>>> first, *middle, last = range(10)
>>> other = [last] * 8
>>> sum(zip(middle, other), tuple())[::2]

R0Q1: PyData community

Who is the creator of Pandas, and author of the book Python for Data Analysis?

R0Q2: Which library is it?

>>> import who_am_i as wai
>>> import numpy as np
>>>
>>> t = np.arange(0.0, 2.0, 0.01)
>>> s = 1 + np.sin(2 * np.pi * t)
>>> wai.plot(t, s)
>>> wai.xlabel('Some numbers')
>>> wai.ylabel('Sine')
>>> wai.title('Sine of some numbers')
>>> wai.grid(True)
>>> wai.show()

R0Q3: Monty Python

From which Monty Python movie is this frame?

R0Q4: Versions

Since which version of Python is this valid Python code?

>>> earth_radius = 6_371
>>> print(f'The radius of earth is {earth_radius}')

R0Q5: Barcelona community

How many meetups has the Barcelona Python Meetup held to date?

R0Q6: Data science theory

In a ROC curve plot, what is the Y axis?

R0Q7: Performance

Sort them from fastest to slowest:

>>> import time
>>> import random
>>> import array
>>> import numpy as np
>>> import pandas as pd
>>> 
>>> rand_list = [random.randint(0, 2 ** 32) for i in range(1_000_000)]
>>> rand_tuple = tuple(rand_list)
>>> rand_array = array.array('L', rand_list)
>>> rand_numpy = np.array(rand_list, dtype=np.int32)
>>> rand_pandas = pd.Series(rand_list)
>>> 
>>> %timeit sum(rand_list)
>>> %timeit sum(rand_tuple)
>>> %timeit sum(rand_array)
>>> %timeit rand_numpy.sum()
>>> %timeit rand_pandas.sum()

R0Q8: Celebrities

  • Born in NYC in 1928
  • Degree in physics from the California Institute of Technology, 1949
  • Ph.D. from University of California, Berkeley in 1954
  • Worked for UNESCO in Liberia as a statistician to find out how many students were in the country's schools
  • Member of the United States National Academy of Sciences
  • Professor emeritus of statistics at UC Berkeley
  • Author of the 2001 paper Statistical Modeling: the Two Cultures
  • Coined the term bagging to describe bootstrap aggregation
  • Awarded the SIGKDD Data Mining and Knowledge Discovery Innovation Award
  • His best known work is considered to be Classification and Regression Trees
  • Was the first to publish about Random Forests

R0Q9: Exceptions

On which line does this code raise an exception, and which exception is it (e.g. ValueError, TypeError...)?

In[1]: foo = {True, False}

In[2]: foo = {a: not a for a in foo}
In[3]: bar = foo[0]

In[4]: foo = {a: foo for a in foo}
In[5]: bar = foo[0][0]

In[6]: foo = {(a, b): foo for a in foo for b in foo}
In[7]: bar = foo[0, 0][0][0]

In[8]: foo = {(a, b): foo for a in foo.keys() for b in foo.keys()}
In[9]: bar = foo[(0, 0), (0, 0)][0, 0][0][0]

In[10]: foo = {(a, b): foo for a in foo.values() for b in foo.values()}
In[11]: bar = foo[[0, 0], [0, 0]][0, 0][0][0]

In[12]: foo = {(a, b): foo for a in foo.items() for b in foo.items()}
In[13]: bar = foo[[0, 0, 0, 0]][0, 0][0][0]

Round 0: Answers

R0Q0: What's the output?

>>> first, *middle, last = range(10)
>>> other = [last] * 8
>>> sum(zip(middle, other), tuple())[::2]

(1, 2, 3, 4, 5, 6, 7, 8) → 3

[1, 2, 3, 4, 5, 6, 7, 8] → 1

R0Q0: What's the output?

>>> first, *middle, last = range(10)
>>> first
0
>>> middle
[1, 2, 3, 4, 5, 6, 7, 8]
>>> last
9

>>> other = [last] * 8
>>> other
[9, 9, 9, 9, 9, 9, 9, 9]

>>> zip_val = list(zip(middle, other))
>>> zip_val
[(1, 9), (2, 9), (3, 9), (4, 9), (5, 9), (6, 9), (7, 9), (8, 9)]

>>> sum_val = sum(zip_val, tuple())
>>> sum_val
(1, 9, 2, 9, 3, 9, 4, 9, 5, 9, 6, 9, 7, 9, 8, 9)

>>> sum_val[::2]
(1, 2, 3, 4, 5, 6, 7, 8)

>>> sum(zip(middle, other), tuple())[::2]
(1, 2, 3, 4, 5, 6, 7, 8)

R0Q1: PyData community

Who is the creator of Pandas, and author of the book Python for Data Analysis?

Wes McKinney → 3

With wrong spelling → 1

R0Q2: Which library is it?

>>> import who_am_i as wai
>>> import numpy as np
>>>
>>> t = np.arange(0.0, 2.0, 0.01)
>>> s = 1 + np.sin(2 * np.pi * t)
>>> wai.plot(t, s)
>>> wai.xlabel('Some numbers')
>>> wai.ylabel('Sine')
>>> wai.title('Sine of some numbers')
>>> wai.grid(True)
>>> wai.show()

matplotlib.pyplot → 3

matplotlib → 1

R0Q2: Which library is it?

>>> import matplotlib.pyplot as plt
>>> import numpy as np
>>>
>>> t = np.arange(0.0, 2.0, 0.01)
>>> s = 1 + np.sin(2 * np.pi * t)
>>> plt.plot(t, s)
>>> plt.xlabel('Some numbers')
>>> plt.ylabel('Sine')
>>> plt.title('Sine of some numbers')
>>> plt.grid(True)
>>> plt.show()

R0Q3: Monty Python

From which Monty Python movie is this frame?

Monty Python and the Holy Grail → 3

Los caballeros de la mesa cuadrada → 3

R0Q4: Versions

Since which version of Python is this valid Python code?

>>> earth_radius = 6_371
>>> print(f'The radius of earth is {earth_radius}')

CPython 3.6 → 3

Prints 'The radius of earth is 6371' in Python 3.6, and raises SyntaxError in all previous versions, because of both the underscore in the numeric literal and the f-string.
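Neither feature changes what the program computes; both are purely syntactic conveniences. A small sketch, runnable on Python 3.6 or later, comparing them with spellings that also work on older versions (the names `new_style` and `old_style` are illustrative):

```python
earth_radius = 6_371  # underscores are only visual separators in the literal

# The underscore does not change the value
assert earth_radius == 6371

# The f-string produces the same text as str.format
new_style = f'The radius of earth is {earth_radius}'
old_style = 'The radius of earth is {}'.format(earth_radius)
print(new_style)  # The radius of earth is 6371
```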

R0Q5: Barcelona community

How many meetups has the Barcelona Python Meetup held to date?

Exactly 100 → 3

Between 80 and 120 → 1

R0Q6: Data science theory

In a ROC curve plot, what is the Y axis?

True positive rate → 3

Sensitivity → 3

Recall → 3

Probability of detection → 3

$\frac{\text{Correctly predicted true}}{\text{Total true}}$
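The ratio can be computed directly from labels and predictions. A minimal sketch in plain Python, assuming binary labels where 1 marks the positive class; the function name `true_positive_rate` and the sample data are made up for illustration:

```python
def true_positive_rate(y_true, y_pred):
    """Fraction of actual positives that were predicted as positive."""
    true_positives = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    actual_positives = sum(1 for t in y_true if t == 1)
    return true_positives / actual_positives


# Made-up labels, only for illustration
y_true = [1, 1, 1, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 1]
tpr = true_positive_rate(y_true, y_pred)
print(tpr)  # 0.75
```

Each point of a ROC curve is this value at one classification threshold, plotted against the false positive rate on the X axis.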

R0Q7: Performance

Sort them from fastest to slowest:

>>> import time
>>> import random
>>> import array
>>> import numpy as np
>>> import pandas as pd
>>> 
>>> rand_list = [random.randint(0, 2 ** 32) for i in range(1_000_000)]
>>> rand_tuple = tuple(rand_list)
>>> rand_array = array.array('L', rand_list)
>>> rand_numpy = np.array(rand_list, dtype=np.int32)
>>> rand_pandas = pd.Series(rand_list)
>>> 
>>> %timeit sum(rand_list)  # 18.8 ms ± 251 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
>>> %timeit sum(rand_tuple)  # 18.4 ms ± 289 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
>>> %timeit sum(rand_array)  # 46 ms ± 1.46 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
>>> %timeit rand_numpy.sum()  # 1.1 ms ± 13.7 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
>>> %timeit rand_pandas.sum()  # 1.37 ms ± 3.8 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

numpy, pandas, tuple, list, array → 3

If numpy was first → 1
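Outside IPython, where `%timeit` is not available, a comparison like this can be reproduced with the standard-library `timeit` module. A minimal sketch restricted to the standard-library containers, with a smaller, arbitrarily chosen sample size; absolute timings depend on the machine and the Python version, so none are shown:

```python
import array
import random
import timeit

# Smaller sample than in the quiz so the sketch runs quickly (arbitrary choice)
rand_list = [random.randint(0, 2 ** 31) for _ in range(100_000)]
containers = {
    'list': rand_list,
    'tuple': tuple(rand_list),
    'array': array.array('L', rand_list),
}

timings = {}
for name, data in containers.items():
    # Best of 3 repeats, each summing the container 10 times
    timings[name] = min(timeit.repeat(lambda: sum(data), number=10, repeat=3))

# Print from fastest to slowest
for name, seconds in sorted(timings.items(), key=lambda item: item[1]):
    print(f'{name}: {seconds:.4f} s')
```

One plausible reason for `array` coming last in the timings above is that `sum()` has to turn each raw value into a Python int object on every iteration, while the list and the tuple already hold such objects.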

R0Q8: Celebrities

  • Born in NYC in 1928
  • Degree in physics from the California Institute of Technology, 1949
  • Ph.D. from University of California, Berkeley in 1954
  • Worked for UNESCO in Liberia as a statistician to find out how many students were in the country's schools
  • Member of the United States National Academy of Sciences
  • Professor emeritus of statistics at UC Berkeley
  • Author of the 2001 paper Statistical Modeling: the Two Cultures
  • Coined the term bagging to describe bootstrap aggregation
  • Awarded the SIGKDD Data Mining and Knowledge Discovery Innovation Award
  • His best known work is considered to be Classification and Regression Trees
  • Was the first to publish about Random Forests

Leo Breiman → 3

R0Q8: Celebrities

Leo Breiman

The one on the left

R0Q9: Exceptions

On which line does this code raise an exception, and which exception is it (e.g. ValueError, TypeError...)?

In[1]: foo = {True, False}

In[2]: foo = {a: not a for a in foo}
In[3]: bar = foo[0]
In[0]: foo
{False: True, True: False}

In[4]: foo = {a: foo for a in foo}
In[5]: bar = foo[0][0]
In[0]: foo
{False: {False: True, True: False}, True: {False: True, True: False}}

R0Q9: Exceptions

On which line does this code raise an exception, and which exception is it (e.g. ValueError, TypeError...)?

In[6]: foo = {(a, b): foo for a in foo for b in foo}
In[7]: bar = foo[0, 0][0][0]

In[8]: foo = {(a, b): foo for a in foo.keys() for b in foo.keys()}
In[9]: bar = foo[(0, 0), (0, 0)][0, 0][0][0]

In[10]: foo = {(a, b): foo for a in foo.values() for b in foo.values()}

TypeError                                 Traceback (most recent call last)
<ipython-input-49-56010e2be892> in <dictcomp>(.0)
      9 bar = foo[(0, 0), (0, 0)][0, 0][0][0]

---> 10 foo = {(a, b): foo for a in foo.values() for b in foo.values()}
     11 bar = foo[[0, 0], [0, 0]][0, 0][0][0]

TypeError: unhashable type: 'dict'

TypeError on line 10 → 3
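Two facts explain the result. Lookups such as `foo[0]` succeed because `False == 0` and both hash to the same value, so `0` finds the `False` key. The comprehension on line 10 fails because the values of `foo` are dictionaries, which are mutable and therefore unhashable, so they cannot be part of a key. A short sketch of both:

```python
foo = {True: False, False: True}

# bool is a subclass of int: 0 matches the key False, 1 matches the key True
assert False == 0 and hash(False) == hash(0)
assert foo[0] is True
assert foo[1] is False

# A tuple is only hashable if all of its elements are; dictionaries are not
message = None
try:
    hash(({}, {}))
except TypeError as error:
    message = str(error)  # same kind of error as in the traceback above

print(message)
```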

Round 1

R1Q0: What's the output?

>>> import itertools
>>> import random
>>> import bisect
>>> 
>>> candidates = [0, 99, 666, 22]
>>> weights = [.33, .33, 99., .34]
>>> cum_weights = itertools.accumulate(weights)
>>> candidates[bisect.bisect(cum_weights, random.random() * cum_weights[-1])]

R1Q1: PyData community

Which of these projects is not a NumFOCUS fiscally sponsored project?

R1Q2: Which library is it?

>>> import who_am_i as wai
>>> import numpy as np
>>> from sklearn.pipeline import make_pipeline
>>> from sklearn.dummy import DummyClassifier
>>>
>>> X = np.array([[ 1., -1.,      2.],
...               [ 2.,  np.nan,  0.],
...               [ 0.,  1.,     -1.]])
>>> y = np.array([0., 1., 0.])
>>>
>>> clf = make_pipeline(wai.Imputer(),
...                     wai.StandardScaler(),
...                     DummyClassifier())
>>>
>>> clf.fit(X, y)
>>> clf.predict(np.array([[2., 0., -1.]]))

R1Q3: Monty Python

Who is the biggest enemy of the People's Front of Judea?

R1Q4: Versions

What is the latest released version of pandas?

R1Q5: Barcelona community

Who was the founder of the Barcelona Python Meetup?

R1Q6: Data science theory

Given a dataset where the number of samples is much larger than the number of features, and the features are continuous, which of the following models should require fewer parameters?

  • k-Nearest Neighbors
  • Logistic Regression
  • Decision Tree (assume min_samples_split close to 0)
  • Multilayer perceptron
  • Linear Support Vector Machine
  • Random Forest (assume trees are not pruned)
  • Deep Neural Network

R1Q7: Performance

Sort them from fastest to slowest:

>>> import random
>>> import numpy as np
>>>
>>> rand_list = [random.randint(0, 2 ** 32) for i in range(1_000_000)]
>>>
>>> %timeit list(map(lambda x: x ** 2, rand_list))
>>> %timeit [x ** 2 for x in rand_list]
>>> %timeit np.array(rand_list) ** 2

R1Q8: Celebrities

  • Born in Solingen, West Germany, in 1967
  • PhD (summa cum laude) in 1995 in computer science and statistics
  • Afterwards, he joined Carnegie Mellon University (CMU) as a research computer scientist
  • Co-director of the Robot Learning Laboratory at CMU
  • Director of the Stanford Artificial Intelligence Lab since 2004
  • His team won the DARPA Grand Challenge in 2005, making a self-driving car cross 240 km in the Mojave Desert
  • Became a Google fellow in 2011, where he co-developed Google Street View
  • Made his Stanford classes in AI available online together with Peter Norvig (www.ai-class.com)
  • Founder of Udacity, a platform for open education created in 2011

R1Q9: Exceptions

On which line does this code raise an exception, and which exception is it (e.g. ValueError, TypeError...)?

In[1]: import pandas as pd

In[2]: df = pd.DataFrame({'value': [1, 2, 3]},
  ...:                   index=[10, 20, 30])

In[3]: df.loc[[10, 20, 30]] + df.iloc[[0, 1, 2]] + df.ix[[0, 1, 2]]

Round 1: Answers

R1Q0: What's the output?

>>> import itertools
>>> import random
>>> import bisect
>>> 
>>> candidates = [0, 99, 666, 22]
>>> weights = [.33, .33, 99., .34]
>>> cum_weights = list(itertools.accumulate(weights))
>>> cum_weights
[0.33, 0.66, 99.66, 100.0]
>>> random_number = random.random() * cum_weights[-1]
>>> # uniformly distributed between 0 and 100
>>> # 99% probability of being between 0.66 and 99.66
>>> random_number
58.95397201789591
>>> index = bisect.bisect(cum_weights, random_number)
>>> index
2
>>> candidates[index]
666

R1Q0: What's the output?

>>> import itertools
>>> import random
>>> import bisect
>>> 
>>> candidates = [0, 99, 666, 22]
>>> weights = [.33, .33, 99., .34]
>>> cum_weights = itertools.accumulate(weights)
>>> candidates[bisect.bisect(cum_weights, random.random() * cum_weights[-1])]
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-7-76cc035203d4> in <module>()
      6 weights = [.33, .33, 99., .34]
      7 cum_weights = itertools.accumulate(weights)
----> 8 candidates[bisect.bisect(cum_weights, random.random() * cum_weights[-1])]

TypeError: 'itertools.accumulate' object is not subscriptable

TypeError → 3

666 with 99% probability → 1
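The fix is to materialize the accumulated weights into a list before indexing and bisecting. A minimal sketch with a seeded generator so the run is repeatable; the helper name `weighted_pick` is hypothetical. `random.choices`, available since Python 3.6, does the same job in a single call:

```python
import bisect
import itertools
import random


def weighted_pick(candidates, weights, rng=random):
    """Pick one candidate with probability proportional to its weight."""
    cum_weights = list(itertools.accumulate(weights))  # a list is subscriptable
    threshold = rng.random() * cum_weights[-1]
    return candidates[bisect.bisect(cum_weights, threshold)]


candidates = [0, 99, 666, 22]
weights = [.33, .33, 99., .34]

rng = random.Random(0)  # seeded only to make the sketch repeatable
picks = [weighted_pick(candidates, weights, rng) for _ in range(1000)]
share_666 = picks.count(666) / len(picks)
print(share_666)  # close to 0.99

# Standard-library equivalent of a single weighted pick
stdlib_pick = rng.choices(candidates, weights=weights, k=1)[0]
```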

R1Q1: PyData community

Which of these projects is not a NumFOCUS fiscally sponsored project?

Gensim → 3

R1Q2: Which library is it?

>>> import who_am_i as wai
>>> import numpy as np
>>> from sklearn.pipeline import make_pipeline
>>> from sklearn.dummy import DummyClassifier
>>>
>>> X = np.array([[ 1., -1.,      2.],
...               [ 2.,  np.nan,  0.],
...               [ 0.,  1.,     -1.]])
>>> y = np.array([0., 1., 0.])
>>>
>>> clf = make_pipeline(wai.Imputer(),
...                     wai.StandardScaler(),
...                     DummyClassifier())
>>>
>>> clf.fit(X, y)
>>> clf.predict(np.array([[2., 0., -1.]]))

sklearn.preprocessing → 1

R1Q2: Which library is it?

>>> import sklearn.preprocessing
>>> import numpy as np
>>> from sklearn.pipeline import make_pipeline
>>> from sklearn.dummy import DummyClassifier
>>>
>>> X = np.array([[ 1., -1.,      2.],
...               [ 2.,  np.nan,  0.],
...               [ 0.,  1.,     -1.]])
>>> y = np.array([0., 1., 0.])
>>>
>>> clf = make_pipeline(sklearn.preprocessing.Imputer(),
...                     sklearn.preprocessing.StandardScaler(),
...                     DummyClassifier())
>>>
>>> clf.fit(X, y)
>>> clf.predict(np.array([[2., 0., -1.]]))
array([ 1.])

If you also knew the prediction is 1. → 3

R1Q3: Monty Python

Who is the biggest enemy of the People's Front of Judea?

The Judean People's Front → 3

The Romans → -1

R1Q4: Versions

What is the latest released version of pandas?

pandas 0.20.1 → 3

pandas 0.20 → 1

R1Q5: Barcelona community

Who was the founder of the Barcelona Python Meetup?

Maik Röder → 5

R1Q6: Data science theory

Given a dataset where the number of samples is much larger than the number of features, which of the following models should require fewer parameters? The features are continuous, and there is no pattern in the features to discriminate the response variable.

  • k-Nearest Neighbors
  • Logistic Regression
  • Decision Tree (assume min_samples_split close to 0)
  • Multilayer perceptron (hidden units is larger than features)
  • Linear Support Vector Machine
  • Random Forest (assume trees are not pruned)
  • Deep Neural Network

Logistic Regression and Linear Support Vector Machine → 3

Just one of them → 1
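As a rough illustration of why the linear models win, their parameter count depends only on the number of features, whereas a k-nearest-neighbors model has to keep the whole training set. A back-of-the-envelope sketch for binary classification; the dataset sizes and the single hidden layer of 64 units are made-up assumptions, and the function names are hypothetical:

```python
def linear_model_params(n_features):
    """Logistic regression or linear SVM: one weight per feature plus an intercept."""
    return n_features + 1


def one_hidden_layer_mlp_params(n_features, n_hidden):
    """Weights and biases of the hidden layer plus those of a single output unit."""
    return (n_features + 1) * n_hidden + (n_hidden + 1)


def knn_stored_values(n_samples, n_features):
    """k-NN learns no weights but stores every training sample."""
    return n_samples * n_features


n_samples, n_features, n_hidden = 1_000_000, 10, 64  # illustrative sizes
print(linear_model_params(n_features))                    # 11
print(one_hidden_layer_mlp_params(n_features, n_hidden))  # 769
print(knn_stored_values(n_samples, n_features))           # 10000000
```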

R1Q7: Performance

Sort them from fastest to slowest:

>>> import random
>>> import numpy as np
>>>
>>> rand_list = [random.randint(0, 2 ** 32) for i in range(1_000_000)]
>>>
>>> %timeit list(map(lambda x: x ** 2, rand_list))  # 589 ms ± 6.77 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
>>> %timeit [x ** 2 for x in rand_list]  # 478 ms ± 1.32 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
>>> %timeit np.array(rand_list) ** 2  # 117 ms ± 21.3 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

numpy → 3

R1Q8: Celebrities

  • Born in Solingen, West Germany, in 1967
  • PhD (summa cum laude) in 1995 in computer science and statistics
  • Afterwards, he joined Carnegie Mellon University (CMU) as a research computer scientist
  • Co-director of the Robot Learning Laboratory at CMU
  • Director of the Stanford Artificial Intelligence Lab since 2004
  • His team won the DARPA Grand Challenge in 2005, making a self-driving car cross 240 km in the Mojave Desert
  • Became a Google fellow in 2011, where he co-developed Google Street View
  • Made his Stanford classes in AI available online together with Peter Norvig (www.ai-class.com)
  • Founder of Udacity, a platform for open education created in 2011

Sebastian Thrun → 3

R1Q9: Exceptions

On which line does this code raise an exception, and which exception is it (e.g. ValueError, TypeError...)?

In[1]: import pandas as pd

In[2]: df = pd.DataFrame({'value': [1, 2, 3]},
  ...:                   index=[10, 20, 30])

In[3]: df.loc[[10, 20, 30]] + df.iloc[[0, 1, 2]] + df.ix[[0, 1, 2]]
DeprecationWarning: 
.ix is deprecated. Please use
.loc for label based indexing or
.iloc for positional indexing

See the documentation here:
http://pandas.pydata.org/pandas-docs/stable/indexing.html#deprecate_ix
  #!~/.anaconda3/bin/python
Out[3]: 
    value
0     NaN
1     NaN
2     NaN
10    NaN
20    NaN
30    NaN

No exception but DeprecationWarning → 3

No exception → 1
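The sum is all NaN because pandas aligns operands on their index labels before adding. `.loc` and `.iloc` both return the rows labelled 10, 20 and 30, but with an integer index `.ix[[0, 1, 2]]` looked the values up by label, and the labels 0, 1 and 2 do not exist. A sketch of the same alignment, assuming a pandas version where `.ix` has been removed and using `reindex` as a stand-in for the label lookup:

```python
import pandas as pd

df = pd.DataFrame({'value': [1, 2, 3]}, index=[10, 20, 30])

by_label = df.loc[[10, 20, 30]]    # rows labelled 10, 20, 30
by_position = df.iloc[[0, 1, 2]]   # the same three rows, chosen by position
missing = df.reindex([0, 1, 2])    # stand-in for df.ix[[0, 1, 2]]: all NaN

# Addition aligns on the union of the labels; a label missing on one side gives NaN
total = by_label + by_position + missing
print(total)
```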

